Discounted approximations of undiscounted stochastic games and Markov decision processes are already poor in the almost deterministic case
Abstract
It is shown that the discount factor needed to solve an undiscounted mean payoff stochastic game to optimality is exponentially close to 1, even in one-player games with a single random node and polynomially bounded rewards and transition probabilities. On the other hand, for the class of so-called irreducible games with perfect information and a constant number of random nodes, we obtain a pseudo-polynomial algorithm using discounts.
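To make the role of the discount factor concrete, here is a minimal sketch on a hypothetical toy instance. It is not the construction from the paper and it does not reproduce the exponential bound: the instance is fully deterministic, and the names `k`, `eps`, and `beta` are illustrative assumptions. At the start node the single player picks between a "greedy" branch (reward 1 once, then 0 forever) and a "patient" branch (reward 0 for `k` steps, then `eps` forever). The sketch compares the normalized discounted values (1 - beta) * V_beta of the two branches with their mean payoffs.

```python
def normalized_discounted_values(beta, k, eps):
    """Return (greedy, patient) normalized discounted values (1 - beta) * V_beta.

    Toy instance (an assumption, not taken from the paper):
      greedy  : reward 1 at step 0, then reward 0 forever
      patient : reward 0 for k steps, then reward eps forever
    """
    # (1 - beta) * 1 * beta^0
    greedy = (1.0 - beta) * 1.0
    # (1 - beta) * sum_{t >= k} beta^t * eps = beta^k * eps
    patient = beta ** k * eps
    return greedy, patient


def discounted_choice(beta, k, eps):
    """Branch preferred under discount factor beta."""
    greedy, patient = normalized_discounted_values(beta, k, eps)
    return "patient" if patient > greedy else "greedy"


def mean_payoff_choice(eps):
    """Branch preferred under the undiscounted mean payoff criterion.

    Long-run averages: greedy -> 0, patient -> eps.
    """
    return "patient" if eps > 0 else "greedy"


if __name__ == "__main__":
    k, eps = 10, 0.1
    for beta in (0.5, 0.9, 0.99, 0.999):
        print(beta, discounted_choice(beta, k, eps), mean_payoff_choice(eps))
```

In this toy, the discounted choice agrees with the mean payoff choice only once `beta` is close enough to 1. The abstract's claim is much stronger: in some one-player instances with a single random node, the required gap 1 - beta is exponentially small.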
Similar resources
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Every stochastic game with perfect information admits a canonical form
We consider discounted and undiscounted stochastic games with perfect information in the form of a natural BWR-model with positions of three types: V_B (Black), V_W (White), and V_R (Random). These BWR-games lie in the complexity class NP∩CoNP and contain the well-known cyclic games (when V_R is empty) and Markov decision processes (when V_B or V_W is empty). We show that the BWR-model is polynomial-time equiv...
Second Order Optimality in Transient and Discounted Markov Decision Chains
Abstract. The article is devoted to second order optimality in Markov decision processes. Attention is primarily focused on the reward variance for discounted models and undiscounted transient models (i.e. where the spectral radius of the transition probability matrix is less than unity). Considering the second order optimality criteria means that in the class of policies maximizing (or minimiz...
Markov Decision Processes and Stochastic Games with Total Effective Payoff
We consider finite Markov decision processes (MDPs) with undiscounted total effective payoff. We show that there exist uniformly optimal pure stationary strategies that can be computed by solving a polynomial number of linear programs. We apply this result to two-player zero-sum stochastic games with perfect information and undiscounted total effective payoff, and derive the existence of a sadd...
Discounting in Games across Time Scales
We introduce two-level discounted games played by two players on a perfect-information stochastic game graph. The upper level game is a discounted game and the lower level game is an undiscounted reachability game. Two-level games model hierarchical and sequential decision making under uncertainty across different time scales. We show the existence of pure memoryless optimal strategies for both...